Interpreting Time in Text, Summarizing Text with Time

Author

  • JUN-PING NG
Abstract

In this thesis, I study two key steps in building a timeline, a logical representation of the temporal information found within the text of newswire articles: 1) intra-sentence event-timex (E-T) temporal relationship classification, and 2) article-wide event-event (E-E) temporal relationship classification. Events and time expressions (timexes) are the basic units of temporal information in text. These two steps allow us to build an understanding of the relative ordering between these basic temporal units. For both classification tasks, I propose more semantically motivated features, namely the use of typed dependency parses and discourse analyses, to achieve better classification performance. This is in contrast to much of the existing literature, which has focused on lexicosyntactic features. Working on E-T temporal relationship classification, I also show that crowdsourcing is a cost-effective and viable avenue through which a high-quality temporal corpus can be built. Making use of the structure of a sentence, I propose a way to identify instances which are computationally and cognitively easier. Excluding these instances from a corpus does not degrade subsequent classifier performance significantly, which allows cost savings of up to 37% when building an E-T temporal corpus. Besides putting together a state-of-the-art temporal processing system, this thesis also validates the efficacy and utility of the timelines that are automatically derived. Temporal information from these timelines is incorporated into a competitive baseline multi-document summarization system. I propose several features derived from timelines and show that they lead to a 4.1% improvement in summarization performance. I also introduce TimeMMR, a modification to the traditional Maximal Marginal Relevance (MMR) algorithm. TimeMMR is shown to be useful in the summarization of some document sets.
To further improve the performance gains derived from the use of temporal information, I propose a reliability filtering metric which gauges how accurate and useful a timeline is. By selectively making use of timelines guided by this reliability filtering metric, overall summarization performance is increased by a statistically significant 5.9%.
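For readers unfamiliar with the traditional MMR algorithm that TimeMMR modifies, the following is a minimal sketch of standard greedy MMR sentence selection. It is not the thesis's TimeMMR, whose modification the abstract does not describe. The function names (mmr_select, jaccard), the word-overlap similarity, and the lam trade-off parameter are illustrative assumptions.

```python
from typing import Callable, List, Sequence


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity (an assumed stand-in for any similarity measure)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def mmr_select(
    candidates: Sequence[str],
    relevance: Callable[[str], float],
    similarity: Callable[[str, str], float],
    k: int,
    lam: float = 0.5,
) -> List[str]:
    """Greedily pick up to k items, trading relevance against redundancy.

    Each step picks the candidate maximizing
        lam * relevance(c) - (1 - lam) * max_{s in selected} similarity(c, s)
    """
    selected: List[str] = []
    remaining = list(candidates)

    def score(c: str) -> float:
        # Redundancy is the similarity to the closest already-selected item.
        redundancy = max((similarity(c, s) for s in selected), default=0.0)
        return lam * relevance(c) - (1.0 - lam) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With lam = 1.0 the selection reduces to ranking by relevance alone; lower values penalize sentences that repeat what has already been selected, which is the behaviour a summarizer wants when several source documents report the same event.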

Similar resources

Biogeography-Based Optimization Algorithm for Automatic Extractive Text Summarization

Given the increasing number of documents, sites, online sources, and the users’ desire to quickly access information, automatic textual summarization has caught the attention of many researchers in this field. Researchers have presented different methods for text summarization as well as a useful summary of those texts including relevant document sentences. This study select...

Improving Precision of Keywords Extracted From Persian Text Using Word2Vec Algorithm

Keywords can present the main concepts of the text without human intervention according to the model. Keywords are important vocabulary words that describe the text and play a very important role in accurate and fast understanding of the content. The purpose of extracting keywords is to identify the subject of the text and the main content of the text in the shortest time. Keyword extraction pl...

Methods of Resolving Prima Facie Conflicts among Narrations in Masābih al-Anwār

Sayyid Abdollāh Shubbar, in Masābih al-Anwār fi Hal Mushkilāt al-Akhbār, argues that the difficulty in the meaning of some narrations is due to their incompatibility with others, and he therefore endeavours to explain away this incompatibility. Having mentioned different views on resolving the incompatibility, Shubbar sometimes explains the preponderant view, but sometimes mentions t...

The Identity of Moses in Surah Al-Qasas with Reference to Time and Space

The question of identity in a narrative text is one of the most influential questions that need further study. The variations in the factors that may affect the concept of identity add to the complexity of the narrative text. The study aims at analyzing the main phases, stages, themes and events of Moses’ life story as part of the narrative discourse. The effects of time and place on the main e...

Benefits of sign language interpreting and text alternatives for deaf students' classroom learning.

Four experiments examined the utility of real-time text in supporting deaf students' learning from lectures in postsecondary (Experiments 1 and 2) and secondary classrooms (Experiments 3 and 4). Experiment 1 compared the effects on learning of sign language interpreting, real-time text (C-Print), and both. Real-time text alone led to significantly higher performance by deaf students than the ot...

Using Fuzzy LR Numbers in Bayesian Text Classifier for Classifying Persian Text Documents

Text classification is an important research field in information retrieval and text mining. The main task in text classification is to assign text documents to predefined categories based on the documents’ contents and labeled training samples. Since word detection is a difficult and time-consuming task in the Persian language, a Bayesian text classifier is an appropriate approach to deal with different...

Publication date: 2014